{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Provincial GDP by Industry"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook explores gross domestic product [data released by Statistics Canada on 2018-05-02](http://www.statcan.gc.ca/daily-quotidien/180502/dq180502a-eng.htm). The objective is to generate an overview of the 🇨🇦 economy, separated by provinces and industry sectors. The amounts are in `Chained (2007) dollars` (Millions).\n",
    "\n",
    "Specific Questions I want to understand:\n",
    "\n",
    "+ What are the top 10 industry sector line items in Canada? \n",
    "+ What are the top 10 industry sector line items in each province?\n",
    "\n",
    "> Author: [J. Colliander](http://colliand.com)  \n",
    "> Date: 2018-05-03\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## The data was downloaded from StatsCan.\n",
    "## The first 3 rows contain header information that we don't want in the data frame.\n",
    "## The last 28 rows contain footnotes and information about the data set. We don't want those in the frame.\n",
    "## skipfooter seemed to require changing the engine from default 'c' to 'python'.\n",
    "gitems = pd.read_csv('./data/cansim-3790030-eng-4789079444265787071.csv', skiprows = 3, skipfooter = 28, engine='python')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Throw away lines with internal Provincial and territorial totals.\n",
    "## There may be other internal totals included here.\n",
    "## TODO Review the footnotes and accompanying information on the data set.\n",
    "df = gitems.loc[~(gitems['North American Industry Classification System (NAICS)'].str.contains('All industries'))]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "years = ['2013', '2014','2015', '2016', '2017']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Total GDP vs Years\n",
    "for year in years:\n",
    "    print(year, df[year].sum())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['Geography'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## long column label.\n",
    "df = df.rename(columns={'North American Industry Classification System (NAICS)': 'NAICS'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "## Industry sectors sorted by total value across all provinces and territories\n",
    "df.groupby(['NAICS']).sum()['2017'].sort_values(ascending=False).head(n=25)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## GDP vs Provinces\n",
    "pd.pivot_table(df,index=['Geography'], \n",
    "               aggfunc = sum, \n",
    "               values = ['2017','2016','2015','2014','2013']).sort_values(by=['2017'], \n",
    "                ascending=[False]).plot(kind='bar', figsize=(10,6))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## 2017 Single Largest GDP Sector Sorted\n",
    "pd.pivot_table(df,index=['NAICS'], \n",
    "               aggfunc = sum, values = ['2017','2016','2015','2014','2013']).sort_values(by=['2017'], \n",
    "                ascending=[False]).head(n=10).plot(kind='bar', figsize=(10,6))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## 2017 GDP vs Sector separated by Geography and sorted by maximum value\n",
    "pd.pivot_table(df,index=['Geography', 'NAICS'], \n",
    "               aggfunc = max, values = ['2017']).sort_values(by=['2017'], ascending=[False]).head(n=25)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## 2017 Individual Lines Items in the GDP breakdown of Geography X Sector\n",
    "pd.pivot_table(df,index=['Geography','NAICS'], \n",
    "               aggfunc = sum, values = ['2017','2016','2015','2014','2013']).sort_values(by='2017', ascending=False).head(n=10).plot(\n",
    "                kind='bar', figsize=(10, 6))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The GDP from `Alberta, Oil and gas extraction [211]` is comparable to the GDP from `Ontario, Real estate [531]`. The GDP from `British Columbia (66), Real Estate [531]` is about half the size of the contribution from Alberta Oil and Gas and Ontario Real Estate."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analyze Provincial GDP across Sectors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['Geography'].unique()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "provinces = ['Newfoundland and Labrador', 'Prince Edward Island', 'Nova Scotia',\n",
    "       'New Brunswick', 'Quebec', 'Ontario', 'Manitoba', 'Saskatchewan',\n",
    "       'Alberta', 'British Columbia (66)', 'Yukon',\n",
    "       'Northwest Territories', 'Nunavut']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "whichprovince = 'British Columbia (66)'\n",
    "pd.pivot_table(df.loc[df['Geography']== whichprovince],index=['NAICS'], \n",
    "               aggfunc = sum, values = ['2017','2016','2015','2014','2013']).sort_values(by='2017', ascending=False).head(n=20).plot(\n",
    "                kind='bar', cmap=\"viridis\", figsize=(16, 10), title= whichprovince)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "whichprovince = 'Alberta'\n",
    "pd.pivot_table(df.loc[df['Geography']== whichprovince],index=['NAICS'], \n",
    "               aggfunc = sum, values = ['2017','2016','2015','2014','2013']).sort_values(by='2017', ascending=False).head(n=20).plot(\n",
    "                kind='bar', cmap=\"viridis\", figsize=(12, 7), title= whichprovince)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "whichprovince = 'Alberta'\n",
    "pd.pivot_table(df.loc[df['Geography']== whichprovince],index=['NAICS'], \n",
    "               aggfunc = sum, values = ['2017']).sort_values(by='2017', ascending=False).head(n=15).plot(\n",
    "                kind='pie', subplots=True, title= whichprovince, legend=False, figsize=(6,6), autopct='%1.1f%%')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "whichprovince = 'Ontario'\n",
    "pd.pivot_table(df.loc[df['Geography']== whichprovince],index=['NAICS'], \n",
    "               aggfunc = sum, values = ['2017']).sort_values(by='2017', ascending=False).head(n=10).plot(\n",
    "                kind='pie', subplots=True, title= whichprovince, legend=False, figsize=(6,6), autopct='%1.1f%%')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "whichprovince = 'British Columbia (66)'\n",
    "pd.pivot_table(df.loc[df['Geography']== whichprovince],index=['NAICS'], \n",
    "               aggfunc = sum, values = ['2017']).sort_values(by='2017', ascending=False).head(n=10).plot(\n",
    "                kind='pie', subplots=True, \n",
    "                title= whichprovince, legend=False, autopct='%1.1f%%', radius=1, figsize =(6,6))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## This code creates a pivot table. Drag the column headings and play with the drop down menus to explore.\n",
    "from pivottablejs import pivot_ui\n",
    "pivot_ui(df)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
